Unsupervised Part-of-Speech Induction
نویسنده
چکیده
Part-of-Speech (POS) tagging is an old and fundamental task in natural language processing. While supervised POS taggers have shown promising accuracy, it is not always feasible to use supervised methods due to lack of labeled data. In this project, we attempt to unsurprisingly induce POS tags by iteratively looking for a recurring pattern of words through a hierarchical agglomerative clustering process. Our approach shows promising results when compared to the tagging results of the state-of-the-art unsupervised POS taggers.
منابع مشابه
Evaluating Unsupervised Part-of-Speech Tagging for Grammar Induction
This paper explores the relationship between various measures of unsupervised part-of-speech tag induction and the performance of both supervised and unsupervised parsing models trained on induced tags. We find that no standard tagging metrics correlate well with unsupervised parsing performance, and several metrics grounded in information theory have no strong relationship with even supervised...
متن کاملA Universal Part-of-Speech Tagset
To facilitate future research in unsupervised induction of syntactic structure and to standardize best-practices, we propose a tagset that consists of twelve universal part-ofspeech categories. In addition to the tagset, we develop a mapping from 25 different treebank tagsets to this universal set. As a result, when combined with the original treebank data, this universal tagset and mapping pro...
متن کاملOnline EM for Unsupervised Models
The (batch) EM algorithm plays an important role in unsupervised induction, but it sometimes suffers from slow convergence. In this paper, we show that online variants (1) provide significant speedups and (2) can even find better solutions than those found by batch EM. We support these findings on four unsupervised tasks: part-of-speech tagging, document classification, word segmentation, and w...
متن کاملAddressing Ambiguity in Unsupervised Part-of-Speech Induction with Substitute Vectors
We study substitute vectors to solve the part-of-speech ambiguity problem in an unsupervised setting. Part-of-speech tagging is a crucial preliminary process in many natural language processing applications. Because many words in natural languages have more than one part-of-speech tag, resolving part-of-speech ambiguity is an important task. We claim that partof-speech ambiguity can be solved u...
متن کاملPart of Speech Induction using Non-negative Matrix Factorization
Unsupervised part-of-speech induction involves the discovery of syntactic categories in a text, given no additional information other than the text itself. One requirement of an induction system is the ability to handle multiple categories for each word, in order to deal with word sense ambiguity. We construct an algorithm for unsupervised part-of-speech induction, treating the problem as one o...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1801.03564 شماره
صفحات -
تاریخ انتشار 2018